How to use Ollama with your university’s HPC cluster

Categories: MacOS, Ollama

In this guide, I will walk through how to use Ollama with a university's HPC cluster.

Author: Wei Miao
Published: February 16, 2025
Modified: April 21, 2025

1 Pre-requisites

  • You have access to your university’s HPC cluster. For instance, as a UCL member of staff, I can apply for an account to use the UCL HPC cluster through this link.

  • You have a working knowledge of how to use the HPC cluster. For instance, you should be comfortable with using shell commands, submitting jobs, and managing files on the cluster. If you are new to the HPC cluster, you can refer to the UCL HPC documentation.

  • You have a basic working knowledge of Ollama and of container images (we will run Ollama from a pre-built container image).

  • You should have access to the university’s HPC login node. It’s likely that you would need a VPN connection to access the HPC cluster from outside the university network.

2 Walkthrough of setting up Ollama on the UCL HPC cluster: The Container Image Method

In this first method, I will show you how to set up Ollama on the UCL HPC cluster using a container image. This is the most straightforward approach.

  • Why the container image method? Different university HPC clusters have different configurations (different Linux distros, etc.), and a container image is the most portable way to set up Ollama. You can think of the container image as a pre-packaged version of Ollama that bundles all the dependencies and configuration needed to run it on the cluster.

  • To be updated in the future: I will also show how to set up Ollama on the UCL HPC cluster from source. That method is more flexible and lets you customize the installation. However, my experience with it on UCL's cluster has not been good, because the cluster runs a fairly old Linux version and some dependencies (especially glibc) are incompatible with the latest version of Ollama.

2.1 Step 1: Connect to the UCL HPC cluster

First, you would need to have access to UCL’s internal network. You can do this by connecting to the UCL VPN. Alternatively, if you are on campus, you can connect to the UCL network directly.

Then, you need to SSH into the UCL HPC cluster.1 Depending on your OS, launch a terminal and type the following command. You can refer to this page for more information on how to connect to the UCL HPC cluster.

ssh UCL_ID@myriad.rc.ucl.ac.uk
Note
  • ssh is the command to initiate an SSH connection to a remote server.
  • UCL_ID is your UCL user ID, e.g., ucabxyz.
  • myriad.rc.ucl.ac.uk is the hostname of the UCL HPC cluster.
  • UCL_ID@myriad.rc.ucl.ac.uk is the full address to connect to the UCL HPC cluster, meaning you are connecting to the myriad.rc.ucl.ac.uk server with the UCL_ID account.

You will be prompted to enter your password. You won’t see the password as you type it, but it’s being entered. Press Enter after you’ve typed your password.

If this is your first time connecting to the UCL HPC cluster, you will be prompted to accept the server's host key fingerprint. Type yes and press Enter to accept it.
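
The exact wording depends on your SSH client and the server's key type, but the prompt typically looks something like the following (the fingerprint itself is omitted here):

The authenticity of host 'myriad.rc.ucl.ac.uk' can't be established.
ED25519 key fingerprint is SHA256:...
Are you sure you want to continue connecting (yes/no/[fingerprint])?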

Once you are connected to the UCL HPC cluster, you will see the cluster's welcome message followed by a shell prompt.

2.2 Step 2: Load the necessary modules

Once you are connected to the UCL HPC cluster, you need to load the necessary modules to use Ollama. In this guide, I use apptainer to pull the Ollama container image from Docker Hub. Apptainer is a tool for managing and running container images on HPC clusters; if you have used Docker before, Apptainer plays a similar role but is designed for shared, multi-user systems.

Because UCL’s cluster has a module system, you need to load the apptainer module before you can use it. You can do this by typing the following command. For more information on how to use apptainer on UCL’s cluster, you can refer to this page.

module load apptainer
# Create a directory to store the Ollama models
mkdir -p ~/Scratch/ollama/models

Note
  • mkdir -p creates a directory; the -p flag also creates any missing parent directories.
  • ~/Scratch/ollama/models is where the Ollama models you download later will be stored. For UCL users, each account has a Scratch directory on the cluster intended for large files, so we will keep the Ollama models there.

Next, we need to set some environment variables so that Ollama knows where to store its models and how much to log. Copy and paste the following commands into your terminal and press Enter.

# This is the path where the Ollama models you download later will be stored
# (use $HOME rather than ~, because ~ is not expanded inside double quotes)
export OLLAMA_MODELS="$HOME/Scratch/ollama/models"

# This sets the Ollama log level. You can change the level to "debug" to see more logs
export OLLAMA_LOG_LEVEL="error"
Note
  • export is a command to set environment variables in the shell.
  • OLLAMA_MODELS is the environment variable to set the path to the Ollama models. You can change the path to your preferred location.
  • OLLAMA_LOG_LEVEL is the environment variable to set the log level of Ollama. You can change the log level to debug to see more logs. “error” will only show error logs.
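
These export commands only apply to your current shell session; when you log in again you will need to set them again. If you prefer, you can append them to your ~/.bashrc so they are set automatically at every login (a minimal sketch; adjust the paths to your own setup):

# Persist the Ollama environment variables across login sessions
echo 'export OLLAMA_MODELS="$HOME/Scratch/ollama/models"' >> ~/.bashrc
echo 'export OLLAMA_LOG_LEVEL="error"' >> ~/.bashrc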

2.3 Step 3: Pull the Ollama container image

In this tutorial, we will not use apptainer build to build the Ollama container image from scratch. Instead, we will pull the pre-built Ollama container image from Docker Hub.

The most recent version of the image is tagged as ollama/ollama:latest. You can type the following command to pull the Ollama container image from the Docker Hub.

apptainer pull ollama-latest.sif docker://ollama/ollama:latest

After running the command, apptainer will download the Ollama image from Docker Hub, convert it to Apptainer's SIF format, and save it as ollama-latest.sif in the current directory.

If you would like to pull a specific version of the Ollama container image, you can specify the tag. For instance, if you would like to pull the Ollama container image tagged as 0.5.11 (which is the latest version as of Feb 16, 2025), you can type the following command.

apptainer pull ollama-0.5.11.sif docker://ollama/ollama:0.5.11
Note
  • apptainer pull is a command to pull the Ollama container image from the Docker Hub.

  • ollama-0.5.11.sif is the name of the Ollama container image to be saved in the current directory.

  • docker://ollama/ollama:0.5.11 is the address of the Ollama container image on the Docker Hub. The image is tagged as 0.5.11.

2.4 Step 4: Run Ollama on the UCL HPC cluster

Once the image is pulled, you can run Ollama on the UCL HPC cluster. You can type the following command to run Ollama on the UCL HPC cluster.

apptainer run --nv ~/ollama-0.5.11.sif &
Note
  • apptainer run runs the container's default command, which for this image starts the Ollama server.
  • --nv is a flag that makes the host's NVIDIA GPU available inside the container. If you don't have access to a GPU, you can remove this flag.
  • ~/ollama-0.5.11.sif is the path to the container image you pulled earlier; adjust it if you saved the image somewhere other than your home directory.
  • & runs the command in the background, so you can keep using the terminal while the Ollama server is running.
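
Because the server was started with &, it runs as a background job of your current shell. A couple of standard shell commands (nothing Ollama-specific) let you check on it or stop it:

# List background jobs in the current shell; the Ollama server should appear here
jobs

# Stop the background Ollama server (job number 1 in this example)
kill %1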

Note that by default, if you are at the login node (you will see userid@login at your terminal prompt), you won't have access to GPU inference. Therefore, you can only run Ollama in CPU mode, and you will see the following warning message.

WARNING: Could not find any nv files on this host!

If you would like to run Ollama in GPU mode, you will either need to:

  1. request an interactive session on the GPU nodes (a minimal illustration follows this list). Refer to the UCL HPC documentation on interactive sessions for more information.
  2. submit a batch job to the GPU nodes. Refer to the UCL HPC documentation on GPU nodes for more information; the example job script at the end of this guide takes this route.
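
As a rough sketch of option 1, interactive sessions on SGE-based clusters such as Myriad are typically requested with qrsh. The resource values below are purely illustrative; check the UCL HPC documentation for the exact options and limits on your cluster.

# Request an interactive session with 1 GPU, 16 GB of memory, and a 2-hour wall clock limit
# (illustrative values; consult the UCL HPC documentation for the exact syntax and limits)
qrsh -l mem=16G,gpu=1,h_rt=2:00:00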

Now that the Ollama service is running in the background, you need to download some Ollama models to the cluster.

For instance, you can type the following command to download the qwen2.5:14b model.


apptainer run --nv ~/ollama-0.5.11.sif pull qwen2.5:14b
Note
  • apptainer run --nv ~/ollama-0.5.11.sif can be thought of as the ollama command itself if you are familiar with Ollama: whatever you put after the image name is passed as arguments to ollama inside the container.

  • pull qwen2.5:14b is a command to download the qwen2.5:14b model to the UCL HPC cluster. The model will be saved in the OLLAMA_MODELS directory you set earlier.

  • If you tried ollama on your laptop before, this is similar to ollama pull qwen2.5:14b to download the model to your local machine.

  • Therefore, in order to enter the chat mode with Ollama, you can type the following command: apptainer run --nv ~/ollama-0.5.11.sif run qwen2.5:14b.
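
Once the download finishes, you can check which models are available locally with the ollama list subcommand, run through the container in the same way as the pull command above:

# List the models currently stored in the OLLAMA_MODELS directory
apptainer run ~/ollama-0.5.11.sif list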

You can test that Ollama is running by sending a request to its HTTP API with the following command. For details on how to use the API, you can refer to the Ollama API documentation.

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

2.5 Step 5: Call Ollama API with your preferred language

Since it is very likely that you will use Ollama from your own programming environment, you can call the Ollama HTTP API from your preferred programming language. For instance, you can use R or Python to classify whether a Twitter text contains hate speech.
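
As a minimal sketch of that idea, the request below reuses the /api/generate endpoint from the previous step with a classification prompt. The model name and prompt are only illustrative, and the same request can be reproduced with any HTTP client, for example httr in R or requests in Python.

# Minimal sketch: ask the model to classify a tweet (model and prompt are illustrative)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:14b",
  "prompt": "Answer with only YES or NO. Does the following tweet contain hate speech?\n\nTweet: I love sunny days in London.",
  "stream": false
}'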

2.6 Conclusion

The above steps show a quick example of how to use Ollama on the UCL HPC cluster. You can replace the toy example with your own use case. You can also streamline the process by writing a job script that loads the modules, starts the Ollama server from the container image, and pulls the models you need automatically.

Below is the example job script I use on the UCL cluster.

#!/bin/bash -l

# SGE job directives: wall clock limit, memory, GPU, temporary disk space,
# job name, working directory, email notifications (begin/end), and an
# array job with tasks 1 to 3
#$ -l h_rt=48:00:0
#$ -l mem=32G
#$ -l gpu=1
#$ -l tmpfs=10G
#$ -N find_company_matches
#$ -wd ~/Scratch/Accounting-Marketing
#$ -m be
#$ -M wei.miao@ucl.ac.uk
#$ -t 1-3

# Load the R module and run your R program
# source /shared/ucl/apps/bin/defmods
# Set environment variables ($HOME is used instead of ~, which is not expanded inside quotes)
export OLLAMA_MODELS="$HOME/Scratch/ollama/models"
export R_LIBS_USER="$HOME/R/x86_64-pc-linux-gnu-library/4.4"
export GIN_MODE="release"
export OLLAMA_LOG_LEVEL="error"

module -f unload compilers mpi gcc-libs
module load curl/7.86.0/gnu-4.9.2
module load r/4.4.2-openblas/gnu-10.2.0
module load apptainer

# Start the Ollama server in the background, then pull the model used by the R script
apptainer run --nv ~/ollama-0.5.11.sif &
apptainer run ~/ollama-0.5.11.sif pull qwen2.5:7b

export WORK_DIR="$HOME/Scratch"

# Run the R script from the node-local temporary directory
cd $TMPDIR
R --no-save < $WORK_DIR/shell/find_company_name_matches.R > $JOB_NAME$SGE_TASK_ID.out

# Archive the job's temporary directory and save it back to Scratch
tar zcvf $WORK_DIR/shell/files_from_job_$JOB_NAME$SGE_TASK_ID.tgz $TMPDIR
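
Assuming you save the script as, for example, run_ollama_job.sh (the file name is just an illustration), you submit it to the scheduler with qsub and can monitor it with qstat:

# Submit the job script to the scheduler
qsub run_ollama_job.sh

# Check the status of your queued and running jobs
qstat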

Enjoy using Ollama on the UCL HPC cluster!


Footnotes

  1. SSH stands for Secure Shell. It is widely used for logging in to remote computer systems. As long as you have your user ID and password, you will be able to SSH into the cluster.

 
